A Derivation of the Unbiased Standard Error of Estimate: The General Case



Francis J. O'Brien, Jr., Ph.D. 



Introduction

This paper represents the fifth in a series of applied statistics
monographs (see O'Brien 1982a, 1982b, 1982c, 1983a). The purpose of these
papers is to provide supplementary reading for applied statistics students.
The intended audience is social science graduate and advanced undergraduate
students. The minimum background for most of the existing and forthcoming
papers is familiarity with elementary analysis of variance and with multiple
correlation and regression analysis.

The unique feature of this series is detailed proofs and derivations of 
important formulas and relationships which are not readily available in 
textbooks, journal articles and similar sources. Each proof or derivation is 
presented in a detailed and clear fashion using well-defined and consistent
notation. When necessary, a review of relevant algebra is provided. 
Calculus is not used or assumed. 

The present paper assumes familiarity with two previous papers in this
series (O'Brien, 1982c, 1983a). Each paper formulated a detailed derivation
of the multiple correlation formula of one criterion and p predictors for the
linear model. The first paper (1982c) presented a derivation of the
multiple R based on standard (Z) scores, and the second showed the analogous
derivation for the raw score model.



Overview of Derivation 



In the present paper, derivations of the unbiased standard error of
estimate for both the raw score and standard score linear models are
presented. The derivations will be presented in graduated steps of
generality. First, the derivation for one criterion (dependent) variable and
one predictor (independent) variable is presented for the raw score model. A
derivation for two raw score predictors is then presented. Next, the
derivation for the three predictor case is formulated. Finally, the
derivation for any (finite) number of predictors is presented. Derivations
for the standard score model are then outlined.
Overview of Regression Analysis



Prior to presenting the derivations, a brief overview of regression
analysis will be given. Let us consider the linear regression model for one
raw score criterion and one predictor. Assume one is attempting to predict
one criterion with one predictor. We assume that the model is linear in
form. The mathematical model we might select to fit such a distribution is
the simple linear equation:

$$\hat{Y} = a + b_1 x_1$$

where:

$\hat{Y}$ = the predicted criterion,
$a$ = the intercept term,
$b_1$ = the slope coefficient term,
$x_1$ = the predictor variable in deviation score form; i.e., $x_1 = X_1 - \bar{X}_1$, where $\bar{X}_1$ is the arithmetic mean.

If a scatter diagram were constructed for this hypothetical model (based on
actual data, of course), the actual raw score observations would in all
likelihood not fall on the line defined by the linear equation of the
idealized mathematical model ($\hat{Y}$). Such deviations from $\hat{Y}$ are
considered errors of prediction. We can conceive of a raw score observation
as consisting of a component predicted by the model plus an error component.
That is:



$$Y = \hat{Y} + e$$

where:

$Y$ = the actual criterion we want to predict by $\hat{Y}$,
$e$ = the amount of numerical error resulting from using the idealized mathematical model ($\hat{Y}$) to predict the actual raw score criterion ($Y$).

That is, an actual dependent (criterion) variable score consists of the
quantity predicted by the idealized "best fitting" line plus an error
component.

The error made in predicting the observed criterion score by the model is
simply:

$$e = Y - \hat{Y}$$

One of the goals of regression analysis is to minimize the prediction error
denoted by $e$ above. It can be seen that if $e = 0$, then the actual criterion
is perfectly predicted by the selected mathematical model. That is to say, the
simple linear equation fitted to the observed data points, $a + b_1 x_1$,
predicts every observation ($Y$) in the distribution. Geometrically, when
$e = 0$, every $Y$ score falls on the straight line $\hat{Y}$. For this case, the
values corresponding to $a$ and $b_1$ can be solved empirically using elementary
algebra based on the observed data. Rarely, however, do such distributions
exist in the social sciences. Consequently, we are forced to select
procedures which will provide computing formulas for calculating the $a$ and $b_1$
terms.

The technique most often used in the social sciences to minimize the
error of prediction is the "least squares" procedure. Essentially, this
procedure seeks to maximize predictability by minimizing prediction error.
The least squares criterion or goal is summarized in the following
expression:

$$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \text{a minimum}$$



If we substitute the quantity for $\hat{Y}$ previously defined, we can rewrite the
least squares criterion as:

$$\sum \left[Y - (a + b_1 x_1)\right]^2 = \sum \left(Y - a - b_1 x_1\right)^2 = \text{minimum}$$

(As an aside, "least squares" means we determine values for a and b such that 
the squared error term results in the least possible value). 
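As a minimal sketch of the least squares criterion (assuming Python as the illustration language and a small hypothetical data set; the helper names `fit_one_predictor` and `sse` are illustrative only), the least squares $a$ and $b_1$ can be computed from the deviation-score predictor. With $x_1$ in deviation form the fitted intercept comes out equal to the criterion mean, and moving either coefficient away from its least squares value only increases the sum of squared errors.

```python
from statistics import mean


def fit_one_predictor(X1, Y):
    """Least squares a and b1 for Y-hat = a + b1*x1, where x1 = X1 - mean(X1)."""
    m = mean(X1)
    x1 = [v - m for v in X1]                      # predictor in deviation form
    b1 = sum(u * y for u, y in zip(x1, Y)) / sum(u * u for u in x1)
    a = mean(Y) - b1 * mean(x1)                   # mean(x1) is 0, so a equals mean(Y)
    return a, b1, x1


def sse(a, b1, x1, Y):
    """Sum of squared prediction errors: sum of (Y - a - b1*x1)**2."""
    return sum((y - a - b1 * u) ** 2 for u, y in zip(x1, Y))


# Hypothetical illustration data (not taken from the paper).
X1 = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
a, b1, x1 = fit_one_predictor(X1, Y)
print(a, b1, sse(a, b1, x1, Y))
# Moving either coefficient away from its least squares value increases the error sum.
print(sse(a + 0.1, b1, x1, Y) > sse(a, b1, x1, Y), sse(a, b1 - 0.1, x1, Y) > sse(a, b1, x1, Y))
```

For these data the sketch gives $a = 4$ (the mean of $Y$), $b_1 = 0.6$, and an error sum of squares of 2.4.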

The Standard Error of Estimate 



The standard error of estimate provides a measure of the average amount of
error that results from using $\hat{Y}$ for $Y$ score prediction (see Lindeman,
et al.). The unbiased standard error of estimate for one predictor is defined
as follows:



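$$S_{Y.x_1} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-2}} = \sqrt{\frac{\sum \left[Y - \left(a + b_1 (X_1 - \bar{X}_1)\right)\right]^2}{n-2}} = \sqrt{\frac{\sum (Y - a - b_1 x_1)^2}{n-2}}$$

where:

$S_{Y.x_1}$ = the unbiased standard error of estimate for one predictor,
$n$ = the sample size.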

Note that the predictor variable ($x_1$) is in deviation form. However, the
criterion to be predicted ($Y$) is not transformed; nor do we transform the
predicted criterion ($\hat{Y}$).

This is the definitional formula for the unbiased standard error of
estimate. An equivalent formula shown in virtually all applied statistics
textbooks is as follows:

$$S_{Y.x_1} = S_y \sqrt{\frac{n-1}{n-2}\left(1 - r^2_{x_1 Y}\right)}$$

where:

$S_y$ = the standard deviation of the actual criterion score,
$r^2_{x_1 Y}$ = the square of the simple Pearson correlation between the predictor in deviation form ($x_1$) and the criterion ($Y$).



This formula will be derived in this paper. 
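Before the derivation, a quick numerical check can make the claimed equivalence concrete. The following is a minimal sketch, assuming hypothetical data and illustrative helper names (`see_definitional`, `pearson_r`, `see_textbook`); `statistics.stdev` uses the $n-1$ divisor, matching $S_y$ here.

```python
import math
from statistics import mean, stdev


def see_definitional(X1, Y):
    """sqrt(sum (Y - a - b1*x1)^2 / (n - 2)), with a = mean(Y) and x1 in deviation form."""
    m = mean(X1)
    x1 = [v - m for v in X1]
    b1 = sum(u * y for u, y in zip(x1, Y)) / sum(u * u for u in x1)
    a = mean(Y)
    resid_ss = sum((y - a - b1 * u) ** 2 for u, y in zip(x1, Y))
    return math.sqrt(resid_ss / (len(Y) - 2))


def pearson_r(X1, Y):
    """Simple Pearson correlation between X1 and Y."""
    mx, my = mean(X1), mean(Y)
    x1 = [v - mx for v in X1]
    y = [v - my for v in Y]
    return sum(u * w for u, w in zip(x1, y)) / math.sqrt(
        sum(u * u for u in x1) * sum(w * w for w in y))


def see_textbook(X1, Y):
    """S_y * sqrt((n - 1)/(n - 2) * (1 - r^2)), with S_y the n - 1 standard deviation."""
    n = len(Y)
    r = pearson_r(X1, Y)
    return stdev(Y) * math.sqrt((n - 1) / (n - 2) * (1 - r ** 2))


X1 = [1, 2, 3, 4, 5]   # hypothetical data
Y = [2, 4, 5, 4, 5]
print(see_definitional(X1, Y), see_textbook(X1, Y))  # both approximately 0.8944
```

Here both routes give $\sqrt{0.8}$: the error sum of squares is 2.4 with $n - 2 = 3$, while $S_y^2 = 1.5$ and $r^2 = 0.6$.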

In general, the standard error of estimate can be obtained for a linear
regression model containing any finite number of predictors. If we let p
represent an indefinite number of raw score predictors, the unbiased standard
error of estimate can be expressed as:

$$S_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-(p+1)}} = S_y\sqrt{\frac{n-1}{n-(p+1)}\left(1 - R^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p}\right)}$$

where:

$S_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p}$ = the unbiased standard error of estimate for p predictors (in deviation score form),
$p$ = an indefinite number of predictors,
$R^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p}$ = the squared linear multiple correlation between one criterion and p predictors.



This formula also will be derived in this paper. 
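The same check extends to p predictors. The sketch below is a minimal illustration, assuming hypothetical data, pure Python, and illustrative helper names (`solve`, `fit`, `see_two_ways`): it solves the least squares normal equations on deviation-score predictors, computes $R$ as the Pearson correlation between $Y$ and $\hat{Y}$, and compares the definitional formula with the $R^2$ form.

```python
import math
from statistics import mean, stdev


def deviations(v):
    """Deviation scores v - mean(v)."""
    m = mean(v)
    return [t - m for t in v]


def solve(A, v):
    """Solve the linear system A b = v by Gauss-Jordan elimination with partial pivoting."""
    k = len(v)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def fit(Xs, Y):
    """Least squares a and b_1..b_p for Y-hat = a + sum_j b_j x_j (x_j in deviation form)."""
    xs = [deviations(col) for col in Xs]
    A = [[sum(s * t for s, t in zip(xi, xj)) for xj in xs] for xi in xs]  # sums of x_i x_j
    v = [sum(s * y for s, y in zip(xi, Y)) for xi in xs]                # sums of x_j Y
    return mean(Y), solve(A, v), xs


def see_two_ways(Xs, Y):
    """Return (definitional, R-squared form) of the unbiased standard error of estimate."""
    n, p = len(Y), len(Xs)
    a, b, xs = fit(Xs, Y)
    y_hat = [a + sum(bj * xj[i] for bj, xj in zip(b, xs)) for i in range(n)]
    definitional = math.sqrt(sum((y - yh) ** 2 for y, yh in zip(Y, y_hat)) / (n - (p + 1)))
    # R is the multiple correlation: the Pearson correlation between Y and Y-hat.
    dy, dh = deviations(Y), deviations(y_hat)
    R2 = sum(s * t for s, t in zip(dy, dh)) ** 2 / (
        sum(s * s for s in dy) * sum(t * t for t in dh))
    formula = stdev(Y) * math.sqrt((n - 1) / (n - (p + 1)) * (1 - R2))
    return definitional, formula


# Hypothetical data with p = 2 predictors (not taken from the paper).
X1 = [1, 2, 3, 4, 5, 6, 7]
X2 = [2, 1, 4, 3, 6, 5, 8]
Y = [3, 4, 8, 7, 11, 12, 15]
print(see_two_ways([X1, X2], Y))   # the two values agree up to rounding
```

Solving the normal equations directly keeps the example dependency-free; a numerical library would normally be used for real data.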



The standard error of estimate also can be derived for regression models
in which the variables have been expressed in standard score (Z) form. The
unbiased sample standard error of estimate for a one predictor standard score
linear model is defined as:

$$S_{Z_Y.Z_1} = \sqrt{\frac{\sum (Z_Y - \hat{Z}_Y)^2}{n-2}} = \sqrt{\frac{\sum (Z_Y - A - B Z_1)^2}{n-2}}$$

where:

$S_{Z_Y.Z_1}$ = the standard error of estimate for the standardized criterion ($Z_Y$) and the standardized predictor ($Z_1$),
$n$ = the sample size,
$A$ = the intercept term,
$Z_1$ = the standardized predictor,
$B$ = the beta (regression) weight,
$Z_Y - \hat{Z}_Y$ = the prediction error.



We show that the definitional formula above is equal to:

$$S_{Z_Y.Z_1} = \sqrt{\frac{n-1}{n-2}\left(1 - r^2_{Z_Y,Z_1}\right)}$$

where:

$r^2_{Z_Y,Z_1}$ = the squared correlation of $Z_Y$ and $Z_1$.



For standard score variables, the unbiased standard error of
estimate for p predictors is:

$$S_{Z_Y.Z_1,Z_2,\ldots,Z_j,\ldots,Z_p} = \sqrt{\frac{n-1}{n-(p+1)}\left(1 - R^2_{Z_Y.Z_1,Z_2,\ldots,Z_j,\ldots,Z_p}\right)}$$

where:

$S_{Z_Y.Z_1,Z_2,\ldots,Z_j,\ldots,Z_p}$ = the unbiased standard error of estimate for p predictors,
$R^2_{Z_Y.Z_1,Z_2,\ldots,Z_j,\ldots,Z_p}$ = the squared multiple correlation between the criterion ($Z_Y$) and p standardized predictors.



In this paper we will concentrate on the standard error of estimate 
for the raw score model. The derivations for the Z score model will 
be outlined. The reader may wish to work out the derivations 
for the standard score model using the detailed presentations for 
the raw score model as a guide. 



Derivations for Raw Score Model 



In the next several sections, we will show the derivations of the 
unbiased standard error of estimate for raw scores. We begin with the 
simplest case of one criterion and one predictor. 

Derivation for One Predictor 



For the reader's convenience in working through the algebra, we will
summarize relevant definitions and formulas. This is done in Table 1.
Table 1

Basic Sample Descriptive Statistics for One Predictor Raw Score Model









Regression model (note a):

$$\hat{Y} = a + b_1 x_1 = \bar{Y} + r_{y1}\frac{S_y}{S_1}x_1$$

Variance of Y:

$$S_y^2 = \frac{\sum (Y - \bar{Y})^2}{n-1} = \frac{\sum y^2}{n-1}$$

Variance of $X_1$:

$$S_1^2 = \frac{\sum (X_1 - \bar{X}_1)^2}{n-1} = \frac{\sum x_1^2}{n-1}$$

Correlation of $x_1$ and $y$ (note b):

$$r_{y1} = \frac{\sum x_1 y}{(n-1) S_y S_1}$$

Note: All summations range from i = 1 to i = n observations.

a. This is derived from the least squares criterion; i.e.,

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - b_1 x_{1i})^2 = \sum_{i=1}^{n} e_i^2 = \text{minimum}$$

See O'Brien, 1983a, p. 44.

b. See O'Brien, 1983a, for justification that the numerator in the correlation formula may be given as $\sum x_1 Y$, $\sum X_1 y$, or $\sum x_1 y$, where $x_1 = X_1 - \bar{X}_1$ and $y = Y - \bar{Y}$.

In this paper, we will use the correlation expression $r_{y1}$ (or $r_{y2}$).
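The identities in Table 1 can be spot-checked numerically. This is a minimal sketch on hypothetical data (the variable names are illustrative): it confirms that the three numerators in note b agree, that the least squares slope equals $r_{y1} S_y / S_1$, and that $\sum y^2 = (n-1)S_y^2$.

```python
from statistics import mean, stdev

# Hypothetical illustration data (not taken from the paper).
X1 = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(Y)
mx, my = mean(X1), mean(Y)
x1 = [v - mx for v in X1]            # x1 = X1 - mean(X1)
y = [v - my for v in Y]              # y = Y - mean(Y)
S_y, S_1 = stdev(Y), stdev(X1)       # standard deviations with n - 1 divisor

# Note b: three equivalent numerators for the correlation.
num_x1y = sum(u * w for u, w in zip(x1, y))
num_x1Y = sum(u * w for u, w in zip(x1, Y))
num_X1y = sum(u * w for u, w in zip(X1, y))

r_y1 = num_x1y / ((n - 1) * S_y * S_1)          # Table 1 correlation formula
b1 = num_x1y / sum(u * u for u in x1)           # least squares slope
print(num_x1y, num_x1Y, num_X1y)                # all three equal 6 for these data
print(b1, r_y1 * S_y / S_1)                     # slope written as r_y1 * S_y / S_1
print(sum(w * w for w in y), (n - 1) * S_y ** 2)  # sum of y^2 agrees with (n - 1) S_y^2
```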



We begin by repeating the definition of the unbiased standard error
of estimate:

$$S_{Y.x_1} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-2}}$$

Substituting for $\hat{Y}$:

$$S_{Y.x_1} = \sqrt{\frac{\sum (Y - a - b_1 x_1)^2}{n-2}}$$



It will be easier if we work with the variance error of estimate, which is
simply the square of the standard error of estimate:

$$S^2_{Y.x_1} = \frac{\sum (Y - a - b_1 x_1)^2}{n-2}$$

It was shown by the author that the intercept term, $a$, is equal to the
criterion mean, $\bar{Y}$ (see O'Brien, 1983a, p. 44). Making that
substitution and rearranging terms:

$$S^2_{Y.x_1} = \frac{\sum \left[(Y - \bar{Y}) - b_1 x_1\right]^2}{n-2}$$



Let us express $(Y - \bar{Y})$ in deviation score form to simplify the
algebra: $y = Y - \bar{Y}$. This gives us:

$$S^2_{Y.x_1} = \frac{\sum (y - b_1 x_1)^2}{n-2}$$

Squaring out the terms inside parentheses for this binomial
expression:

$$S^2_{Y.x_1} = \frac{\sum \left(y^2 + b_1^2 x_1^2 - 2 y b_1 x_1\right)}{n-2}$$



Bringing the summation operator inside and factoring constants
outside the summation operator (recall that $b_1$ functions
as a constant to be estimated in the regression model):

$$S^2_{Y.x_1} = \frac{1}{n-2}\left(\sum y^2 + b_1^2 \sum x_1^2 - 2 b_1 \sum x_1 y\right)$$



Substituting the following expressions (see Table 1):

$$\sum y^2 = (n-1) S_y^2$$

$$\sum x_1^2 = (n-1) S_1^2$$

$$b_1 = r_{y1}\frac{S_y}{S_1}$$

$$\sum x_1 y = (n-1) r_{y1} S_y S_1$$

(based on substitution from Table 1 and O'Brien, 1983a, p. 44).

Thus:

$$S^2_{Y.x_1} = \frac{1}{n-2}\left[(n-1)S_y^2 + \left(r_{y1}\frac{S_y}{S_1}\right)^2 (n-1)S_1^2 - 2\left(r_{y1}\frac{S_y}{S_1}\right)(n-1) r_{y1} S_y S_1\right]$$



Factoring out the (n-1) term:

$$S^2_{Y.x_1} = \frac{n-1}{n-2}\left[S_y^2 + r_{y1}^2\frac{S_y^2}{S_1^2}S_1^2 - 2 r_{y1}^2 S_y^2\right]$$

Simplifying:

$$S^2_{Y.x_1} = \frac{n-1}{n-2}\left[S_y^2 + r_{y1}^2 S_y^2 - 2 r_{y1}^2 S_y^2\right] = \frac{n-1}{n-2}\left[S_y^2 - r_{y1}^2 S_y^2\right] = \frac{n-1}{n-2} S_y^2 \left(1 - r_{y1}^2\right)$$



Taking the (positive) square root, the unbiased standard error of estimate
for one raw score predictor is:

$$S_{Y.x_1} = S_y \sqrt{\frac{n-1}{n-2}\left(1 - r_{y1}^2\right)}$$

END OF PROOF
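Each intermediate line of the one-predictor proof can also be evaluated numerically to confirm that no step changes the value. A minimal sketch, assuming hypothetical data and illustrative variable names such as `step_expanded`:

```python
from statistics import mean, stdev

# Hypothetical illustration data (not taken from the paper).
X1 = [2, 4, 4, 5, 7, 9]
Y = [1, 3, 2, 5, 6, 8]
n = len(Y)
mx, my = mean(X1), mean(Y)
x1 = [v - mx for v in X1]
y = [v - my for v in Y]
S_y, S_1 = stdev(Y), stdev(X1)
r = sum(u * w for u, w in zip(x1, y)) / ((n - 1) * S_y * S_1)   # r_y1
b1 = r * S_y / S_1                                               # Table 1 slope

# Each expression below is one step of the derivation; all should agree.
step_definition = sum((Yi - my - b1 * u) ** 2 for Yi, u in zip(Y, x1)) / (n - 2)
step_expanded = (sum(w * w for w in y) + b1 ** 2 * sum(u * u for u in x1)
                 - 2 * b1 * sum(u * w for u, w in zip(x1, y))) / (n - 2)
step_substituted = ((n - 1) * S_y ** 2
                    + (r * S_y / S_1) ** 2 * (n - 1) * S_1 ** 2
                    - 2 * (r * S_y / S_1) * (n - 1) * r * S_y * S_1) / (n - 2)
step_final = (n - 1) / (n - 2) * S_y ** 2 * (1 - r ** 2)
print(step_definition, step_expanded, step_substituted, step_final)
```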



Derivation for Two Predictors 



In this section we seek to show that the unbiased standard
error of estimate for two raw score predictors is:

$$S_{Y.x_1,x_2} = S_y\sqrt{\frac{n-1}{n-3}\left(1 - R^2_{Y.x_1,x_2}\right)}$$

where:

$S_y$ = the observed criterion standard deviation,
$R^2_{Y.x_1,x_2}$ = the squared multiple correlation between the criterion and the two raw score predictors (in deviation score form).



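We begin with the definition of the unbiased standard error
of estimate for two raw score predictors:

$$S_{Y.x_1,x_2} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-(p+1)}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-3}} = \sqrt{\frac{\sum (Y - a - b_1 x_1 - b_2 x_2)^2}{n-3}}$$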



As in the one predictor derivation, it will be easier to work with the
variance error of estimate:

$$S^2_{Y.x_1,x_2} = \frac{\sum (Y - a - b_1 x_1 - b_2 x_2)^2}{n-3}$$



Substituting $\bar{Y}$ for the intercept term and rearranging:

$$S^2_{Y.x_1,x_2} = \frac{\sum \left[(Y - \bar{Y}) - b_1 x_1 - b_2 x_2\right]^2}{n-3}$$



Now, expressing $Y - \bar{Y}$ in deviation form and expanding the
trinomial expression:

$$S^2_{Y.x_1,x_2} = \frac{1}{n-3}\sum\left(y^2 + b_1^2 x_1^2 + b_2^2 x_2^2 - 2 y b_1 x_1 - 2 y b_2 x_2 + 2 b_1 b_2 x_1 x_2\right)$$



Bringing the summation operator inside and factoring constants:

$$S^2_{Y.x_1,x_2} = \frac{1}{n-3}\left(\sum y^2 + b_1^2 \sum x_1^2 + b_2^2 \sum x_2^2 - 2 b_1 \sum x_1 y - 2 b_2 \sum x_2 y + 2 b_1 b_2 \sum x_1 x_2\right)$$



The following formulas can be used for simplification:

$$\sum y^2 = (n-1) S_y^2$$

$$\sum x_1^2 = (n-1) S_1^2$$

$$\sum x_2^2 = (n-1) S_2^2$$

$$\sum x_1 y = (n-1) r_{y1} S_y S_1$$

$$\sum x_2 y = (n-1) r_{y2} S_y S_2$$

$$\sum x_1 x_2 = (n-1) r_{12} S_1 S_2$$

For easy reference, these formulas are summarized in Table 2.
Table 2

Substitution Equations for Two Predictor Raw Score Model

$$\sum y^2 = (n-1) S_y^2$$

$$\sum x_1^2 = (n-1) S_1^2$$

$$\sum x_2^2 = (n-1) S_2^2$$

$$\sum x_1 y = (n-1) r_{y1} S_y S_1$$

$$\sum x_2 y = (n-1) r_{y2} S_y S_2$$

$$\sum x_1 x_2 = (n-1) r_{12} S_1 S_2$$

Note: Equations are expressed in deviation score form. Each equation is based on algebraic rearrangements of basic sample descriptive statistics (compare Table 1). For example, the variance of Y is:

$$S_y^2 = \frac{\sum (Y - \bar{Y})^2}{n-1} = \frac{\sum y^2}{n-1}$$

Solving in terms of $\sum y^2$: $\sum y^2 = (n-1) S_y^2$.



Making these substitutions:

$$S^2_{Y.x_1,x_2} = \frac{1}{n-3}\Big[(n-1)S_y^2 + (n-1)b_1^2 S_1^2 + (n-1)b_2^2 S_2^2 - 2(n-1)b_1 r_{y1} S_y S_1 - 2(n-1)b_2 r_{y2} S_y S_2 + 2(n-1)b_1 b_2 r_{12} S_1 S_2\Big]$$



Factoring out the (n-1) term and rearranging:

$$S^2_{Y.x_1,x_2} = \frac{n-1}{n-3}\Big[S_y^2 + \left(b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2\right) - 2\left(b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2\right)\Big]$$



The next step is very important. The two terms in parentheses
reduce to functions of the squared multiple R for two predictors.
As was shown in the author's 1983a paper, the derivation of
$R^2$ for two predictors results in several equivalent ways to express
$R^2$ or $R^2_{Y.x_1,x_2}$. Table 3 shows forms of $R^2$ which will be used in the
next step (compare O'Brien, 1983a, pages 12-18, especially p. 18).
Table 3

Functions of $R^2$ for Two Raw Score Predictors (note a)

$$R^2 = \frac{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}{S_y^2} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}{S_y^2}$$

Rearranging:

$$R^2 S_y^2 = b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2 = b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2$$

Note: $R^2 = R^2_{Y.x_1,x_2}$.

a. See O'Brien, 1983a.
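The two functions of $R^2$ in Table 3 can be checked on hypothetical data. A minimal sketch, assuming illustrative names (`dev`, `dot`) and solving the two normal equations by Cramer's rule; $R^2$ is computed independently as the squared correlation between $Y$ and $\hat{Y}$:

```python
from statistics import mean, stdev


def dev(v):
    """Deviation scores v - mean(v)."""
    m = mean(v)
    return [t - m for t in v]


def dot(u, w):
    """Sum of cross products."""
    return sum(s * t for s, t in zip(u, w))


X1 = [1, 2, 3, 4, 5, 6, 7]      # hypothetical data
X2 = [2, 1, 4, 3, 6, 5, 8]
Y = [3, 4, 8, 7, 11, 12, 15]
n = len(Y)
x1, x2, y = dev(X1), dev(X2), dev(Y)
S_y, S_1, S_2 = stdev(Y), stdev(X1), stdev(X2)
r_y1 = dot(x1, y) / ((n - 1) * S_y * S_1)
r_y2 = dot(x2, y) / ((n - 1) * S_y * S_2)
r_12 = dot(x1, x2) / ((n - 1) * S_1 * S_2)

# Least squares weights from the two normal equations (Cramer's rule).
D = dot(x1, x1) * dot(x2, x2) - dot(x1, x2) ** 2
b1 = (dot(x1, y) * dot(x2, x2) - dot(x2, y) * dot(x1, x2)) / D
b2 = (dot(x2, y) * dot(x1, x1) - dot(x1, y) * dot(x1, x2)) / D

# R is the correlation between Y and Y-hat (both in deviation form here).
y_hat = [b1 * u + b2 * w for u, w in zip(x1, x2)]
R2 = dot(y_hat, y) ** 2 / (dot(y_hat, y_hat) * dot(y, y))

quadratic_form = b1 ** 2 * S_1 ** 2 + b2 ** 2 * S_2 ** 2 + 2 * b1 * b2 * r_12 * S_1 * S_2
cross_products = b1 * r_y1 * S_y * S_1 + b2 * r_y2 * S_y * S_2
print(R2 * S_y ** 2, quadratic_form, cross_products)   # all three agree
```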



Thus,

$$R^2 S_y^2 = b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2 = b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2$$



Making these substitutions:

$$S^2_{Y.x_1,x_2} = \frac{n-1}{n-3}\left[S_y^2 + S_y^2 R^2 - 2 S_y^2 R^2\right] = \frac{n-1}{n-3}\left[S_y^2 - S_y^2 R^2\right] = \frac{n-1}{n-3} S_y^2 \left(1 - R^2_{Y.x_1,x_2}\right)$$



Taking the positive square root, the unbiased standard error of estimate for
two raw score predictors is:

$$S_{Y.x_1,x_2} = S_y\sqrt{\frac{n-1}{n-3}\left(1 - R^2_{Y.x_1,x_2}\right)}$$

END OF PROOF



Derivation for Three Predictors 



Prior to showing the derivation for the general case of p predictors, we
will present the derivation for the three predictor model. This allows us to
review the logic and procedures of the derivation. In addition, we introduce
summation notation throughout the steps of the derivation, which
simplifies the algebra for the general case.

For three raw score predictors, we will show that:

$$S_{Y.x_1,x_2,x_3} = S_y\sqrt{\frac{n-1}{n-4}\left(1 - R^2_{Y.x_1,x_2,x_3}\right)}$$



We begin by presenting the definition of the unbiased standard error
of estimate for three predictors:

$$S_{Y.x_1,x_2,x_3} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-(p+1)}} = \sqrt{\frac{\sum (Y - a - b_1 x_1 - b_2 x_2 - b_3 x_3)^2}{n-4}}$$



As before, we will work with the variance error of estimate:

$$S^2_{Y.x_1,x_2,x_3} = \frac{\sum (Y - a - b_1 x_1 - b_2 x_2 - b_3 x_3)^2}{n-4}$$



Proceeding as before, we first replace $a$ with $\bar{Y}$ and
express $Y - \bar{Y}$ as $y$:

$$S^2_{Y.x_1,x_2,x_3} = \frac{\sum \left[(Y - \bar{Y}) - b_1 x_1 - b_2 x_2 - b_3 x_3\right]^2}{n-4} = \frac{\sum (y - b_1 x_1 - b_2 x_2 - b_3 x_3)^2}{n-4}$$



Expanding this quadrinomial expression:

$$S^2_{Y.x_1,x_2,x_3} = \frac{1}{n-4}\sum\Big(y^2 + b_1^2 x_1^2 + b_2^2 x_2^2 + b_3^2 x_3^2 - 2 y b_1 x_1 - 2 y b_2 x_2 - 2 y b_3 x_3 + 2 b_1 b_2 x_1 x_2 + 2 b_1 b_3 x_1 x_3 + 2 b_2 b_3 x_2 x_3\Big)$$

Bringing the summation operator inside:



$$S^2_{Y.x_1,x_2,x_3} = \frac{1}{n-4}\Big(\sum y^2 + b_1^2 \sum x_1^2 + b_2^2 \sum x_2^2 + b_3^2 \sum x_3^2 - 2 b_1 \sum x_1 y - 2 b_2 \sum x_2 y - 2 b_3 \sum x_3 y + 2 b_1 b_2 \sum x_1 x_2 + 2 b_1 b_3 \sum x_1 x_3 + 2 b_2 b_3 \sum x_2 x_3\Big)$$



The following substitution formulas, stated in general form, will help us to
simplify the above expression (see Table 4 for reference):

For $y$: $\sum y^2 = (n-1) S_y^2$

For any $x_j$: $\sum x_j^2 = (n-1) S_j^2$

For any $y, x_j$: $\sum x_j y = (n-1) r_{yj} S_y S_j$

For any $x_i, x_j$: $\sum x_i x_j = (n-1) r_{ij} S_i S_j$



Applying these substitutions:

$$S^2_{Y.x_1,x_2,x_3} = \frac{1}{n-4}\Big[(n-1)S_y^2 + (n-1)b_1^2 S_1^2 + (n-1)b_2^2 S_2^2 + (n-1)b_3^2 S_3^2 - 2(n-1)b_1 r_{y1} S_y S_1 - 2(n-1)b_2 r_{y2} S_y S_2 - 2(n-1)b_3 r_{y3} S_y S_3 + 2(n-1)b_1 b_2 r_{12} S_1 S_2 + 2(n-1)b_1 b_3 r_{13} S_1 S_3 + 2(n-1)b_2 b_3 r_{23} S_2 S_3\Big]$$
Table 4 (note a)

Generalized Substitution Equations for Raw Score Model

$$\sum y^2 = (n-1) S_y^2$$

$$\sum x_j^2 = (n-1) S_j^2$$

$$\sum x_j y = (n-1) r_{yj} S_y S_j$$

$$\sum x_i x_j = (n-1) r_{ij} S_i S_j$$

a. For example, the second equation applies to any X variable; for the jth X variable, the sum of squares is related to the jth variance.
Factoring out (n-1) and rearranging:

$$S^2_{Y.x_1,x_2,x_3} = \frac{n-1}{n-4}\Big[S_y^2 + \left(b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3\right) - 2\left(b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3\right)\Big]$$



We now express the parenthesized terms in summation notation (see O'Brien,
1983a):

$$S^2_{Y.x_1,x_2,x_3} = \frac{n-1}{n-4}\left[S_y^2 + \left(\sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j\right) - 2\left(\sum_{j=1}^{3} b_j r_{yj} S_y S_j\right)\right]$$



Table 5 shows equivalent forms of $R^2$ for three predictors stated
in summation notation.
Table 5

Functions of $R^2$ for Three Raw Score Predictors (note a)

$$R^2 = \frac{\sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}{S_y^2} = \frac{\sum_{j=1}^{3} b_j r_{yj} S_y S_j}{S_y^2}$$

Rearranging:

$$R^2 S_y^2 = \sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = \sum_{j=1}^{3} b_j r_{yj} S_y S_j$$

Note: $R^2 = R^2_{Y.x_1,x_2,x_3}$.

a. See O'Brien, 1983a.
Thus:

$$R^2 S_y^2 = \sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = \sum_{j=1}^{3} b_j r_{yj} S_y S_j$$



Substituting:

$$S^2_{Y.x_1,x_2,x_3} = \frac{n-1}{n-4}\left[S_y^2 + S_y^2 R^2 - 2 S_y^2 R^2\right]$$



Simplifying:

$$S^2_{Y.x_1,x_2,x_3} = \frac{n-1}{n-4}\left[S_y^2 - S_y^2 R^2\right]$$

or

$$S^2_{Y.x_1,x_2,x_3} = \frac{n-1}{n-4} S_y^2 \left(1 - R^2_{Y.x_1,x_2,x_3}\right)$$



Therefore, the unbiased standard error of estimate is:

$$S_{Y.x_1,x_2,x_3} = S_y\sqrt{\frac{n-1}{n-4}\left(1 - R^2_{Y.x_1,x_2,x_3}\right)}$$

END OF PROOF





Derivation For p Predictors 



In this section, we show the general form of the unbiased
standard error of estimate when the regression model contains
some unknown but finite number of predictors (p).
We will follow the same steps in the derivation we used for one,
two and three predictors. It will be seen that the derivation
for the general case of p predictors is a straightforward
multivariate generalization.

Formally, we will show that the unbiased standard error of
estimate for p predictors is:

$$S_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = S_y\sqrt{\frac{n-1}{n-(p+1)}\left(1 - R^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p}\right)}$$

Definitions for terms in the formula were given in the section
"The Standard Error of Estimate".

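Starting with the definition of the unbiased standard error
of estimate:

$$S_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-(p+1)}} = \sqrt{\frac{\sum (Y - a - b_1 x_1 - b_2 x_2 - \cdots - b_j x_j - \cdots - b_p x_p)^2}{n-(p+1)}}$$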



As in the previous derivations, we will work with the variance error of
estimate:

$$S^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{\sum (Y - a - b_1 x_1 - b_2 x_2 - \cdots - b_j x_j - \cdots - b_p x_p)^2}{n-(p+1)}$$



Now replace $a$ by $\bar{Y}$, and express $Y - \bar{Y}$ in deviation score form:

$$S^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{\sum (y - b_1 x_1 - b_2 x_2 - \cdots - b_j x_j - \cdots - b_p x_p)^2}{n-(p+1)}$$
Expanding this multinomial:

$$S^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{1}{n-(p+1)}\sum\Big(y^2 + b_1^2 x_1^2 + b_2^2 x_2^2 + \cdots + b_j^2 x_j^2 + \cdots + b_p^2 x_p^2 - 2 y b_1 x_1 - 2 y b_2 x_2 - \cdots - 2 y b_j x_j - \cdots - 2 y b_p x_p + 2 b_1 b_2 x_1 x_2 + 2 b_1 b_3 x_1 x_3 + \cdots + 2 b_i b_j x_i x_j + \cdots + 2 b_{p-1} b_p x_{p-1} x_p\Big)$$

Bringing the summation operator inside:

$$S^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{1}{n-(p+1)}\Big(\sum y^2 + b_1^2 \sum x_1^2 + b_2^2 \sum x_2^2 + \cdots + b_j^2 \sum x_j^2 + \cdots + b_p^2 \sum x_p^2 - 2 b_1 \sum x_1 y - 2 b_2 \sum x_2 y - \cdots - 2 b_j \sum x_j y - \cdots - 2 b_p \sum x_p y + 2 b_1 b_2 \sum x_1 x_2 + 2 b_1 b_3 \sum x_1 x_3 + \cdots + 2 b_i b_j \sum x_i x_j + \cdots + 2 b_{p-1} b_p \sum x_{p-1} x_p\Big)$$



Using the generalized substitution formulas given in Table 4, we
can simplify as follows:

$$S^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{1}{n-(p+1)}\Big[(n-1)S_y^2 + (n-1)b_1^2 S_1^2 + (n-1)b_2^2 S_2^2 + \cdots + (n-1)b_p^2 S_p^2 - 2(n-1)b_1 r_{y1} S_y S_1 - 2(n-1)b_2 r_{y2} S_y S_2 - \cdots - 2(n-1)b_j r_{yj} S_y S_j - \cdots - 2(n-1)b_p r_{yp} S_y S_p + 2(n-1)b_1 b_2 r_{12} S_1 S_2 + 2(n-1)b_1 b_3 r_{13} S_1 S_3 + \cdots + 2(n-1)b_i b_j r_{ij} S_i S_j + \cdots + 2(n-1)b_{p-1} b_p r_{p-1,p} S_{p-1} S_p\Big]$$



Factoring out (n-1) and rearranging:

$$S^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{n-1}{n-(p+1)}\Big[S_y^2 + \left(b_1^2 S_1^2 + b_2^2 S_2^2 + \cdots + b_j^2 S_j^2 + \cdots + b_p^2 S_p^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + \cdots + 2 b_i b_j r_{ij} S_i S_j + \cdots + 2 b_{p-1} b_p r_{p-1,p} S_{p-1} S_p\right) - 2\left(b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + \cdots + b_j r_{yj} S_y S_j + \cdots + b_p r_{yp} S_y S_p\right)\Big]$$

Expressing the terms in parentheses in summation notation:
$$S^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{n-1}{n-(p+1)}\left[S_y^2 + \left(\sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j\right) - 2\left(\sum_{j=1}^{p} b_j r_{yj} S_y S_j\right)\right]$$

Table 6 shows equivalent forms of the squared multiple R for p
predictors (see O'Brien, 1983a).
Table 6

Functions of $R^2$ for p Predictors (note a)

$$R^2 = \frac{\sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}{S_y^2} = \frac{\sum_{j=1}^{p} b_j r_{yj} S_y S_j}{S_y^2}$$

Rearranging:

$$R^2 S_y^2 = \sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = \sum_{j=1}^{p} b_j r_{yj} S_y S_j$$

Note: $R^2 = R^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p}$.

a. See O'Brien, 1983a.
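The summation forms in Table 6 can be evaluated directly for a particular p. The sketch below is a minimal illustration with p = 3 on hypothetical data, assuming pure Python and illustrative helper names; the double sum over $j = 2..p$, $i = 1..j-1$ becomes `for j in range(1, p) for i in range(j)` with 0-based indices.

```python
from statistics import mean, stdev


def deviations(v):
    """Deviation scores v - mean(v)."""
    m = mean(v)
    return [t - m for t in v]


def dot(u, w):
    """Sum of cross products."""
    return sum(s * t for s, t in zip(u, w))


def solve(A, v):
    """Solve A b = v by Gauss-Jordan elimination with partial pivoting."""
    k = len(v)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Hypothetical data with p = 3 predictors (not taken from the paper).
Xs = [[1, 2, 3, 4, 5, 6, 7, 8],
      [2, 1, 4, 3, 6, 5, 8, 7],
      [5, 3, 4, 1, 2, 6, 8, 7]]
Y = [4, 5, 9, 8, 13, 12, 18, 17]
n, p = len(Y), len(Xs)

xs = [deviations(col) for col in Xs]
y = deviations(Y)
b = solve([[dot(xi, xj) for xj in xs] for xi in xs], [dot(xj, y) for xj in xs])
S_y = stdev(Y)
S = [stdev(col) for col in Xs]
r_y = [dot(xj, y) / ((n - 1) * S_y * Sj) for xj, Sj in zip(xs, S)]
r = [[dot(xi, xj) / ((n - 1) * Si * Sj) for xj, Sj in zip(xs, S)] for xi, Si in zip(xs, S)]

quadratic = sum(b[j] ** 2 * S[j] ** 2 for j in range(p)) + 2 * sum(
    b[i] * b[j] * r[i][j] * S[i] * S[j] for j in range(1, p) for i in range(j))
cross = sum(b[j] * r_y[j] * S_y * S[j] for j in range(p))

y_hat = [sum(b[j] * xs[j][i] for j in range(p)) for i in range(n)]  # deviation form
R2 = dot(y, y_hat) ** 2 / (dot(y, y) * dot(y_hat, y_hat))
print(R2 * S_y ** 2, quadratic, cross)   # all three agree
```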
Thus:

$$R^2 S_y^2 = \sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = \sum_{j=1}^{p} b_j r_{yj} S_y S_j$$



Substituting into the variance error of estimate above:

$$S^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{n-1}{n-(p+1)}\left[S_y^2 + R^2 S_y^2 - 2 R^2 S_y^2\right] = \frac{n-1}{n-(p+1)} S_y^2 \left[1 - R^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p}\right]$$



Therefore:

$$S_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p} = S_y\sqrt{\frac{n-1}{n-(p+1)}\left(1 - R^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p}\right)}$$

END OF PROOF
Derivations for Standard Score Model

Introduction



We have presented derivations for the unbiased standard error
of estimate for the linear raw score model when the number
of predictors was one, two, three and some finite number, p.
In this part of the paper we will outline the derivations
for the standard score model.

The reader may be aware that there is a simple relationship
between models in raw score form and standard score (Z) form.
This relationship obviates the need for presenting detailed
derivations for the Z score model. Therefore, we will outline
the derivations for the standard score model, and leave the
proofs as an exercise for the reader. We will show the logic
behind transforming from the linear raw score model to the Z
score model. First we take the standardized model for one
predictor. We then provide an outline for generalizing the
derivation for the p predictor standard score case.
Derivation for One Predictor

Recall the derivation for the one predictor raw score model.
The standard error of estimate was shown to be:

$$S_{Y.x_1} = S_y\sqrt{\frac{n-1}{n-2}\left(1 - r_{y1}^2\right)}$$



Let us now consider the model in standard score form.
First, recall the following relationships for the Z score
model (see O'Brien, 1982b for proofs):

$$S_{Z_Y} = 1$$

$$r^2_{Z_Y,Z_1} = r^2_{y1}$$
That is, the standard deviation of the raw score variable Y is equal to
unity when Y is standardized. Also, the square of the simple (zero-order)
Pearson correlation calculated in raw score form is identical to the
square of the correlation between the same variables after each has been
standardized. Taking these facts into account, we can rewrite the raw
score standard error of estimate for Z scores as follows:



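$$S_{Z_Y.Z_1} = \sqrt{\frac{\sum (Z_Y - \hat{Z}_Y)^2}{n-2}} = S_{Z_Y}\sqrt{\frac{n-1}{n-2}\left(1 - r^2_{Z_Y,Z_1}\right)} = \sqrt{\frac{n-1}{n-2}\left(1 - r^2_{Z_Y,Z_1}\right)}$$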
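The two facts used above, and the resulting Z score formula, can be checked on hypothetical data. A minimal sketch, assuming illustrative helper names (`z_scores`, `pearson_r`, `see_one_predictor`) and standard scores formed with the $n-1$ standard deviation; it also shows that the Z score result equals the raw score standard error of estimate divided by $S_y$.

```python
import math
from statistics import mean, stdev


def z_scores(v):
    """Standard scores (v - mean) / S, using the n - 1 standard deviation."""
    m, s = mean(v), stdev(v)
    return [(t - m) / s for t in v]


def pearson_r(u, w):
    """Simple (zero-order) Pearson correlation."""
    mu, mw = mean(u), mean(w)
    du, dw = [t - mu for t in u], [t - mw for t in w]
    return sum(s * t for s, t in zip(du, dw)) / math.sqrt(
        sum(s * s for s in du) * sum(t * t for t in dw))


def see_one_predictor(X, Y):
    """Definitional unbiased standard error of estimate for one predictor."""
    m = mean(X)
    x = [t - m for t in X]
    b = sum(s * t for s, t in zip(x, Y)) / sum(s * s for s in x)
    a = mean(Y)
    return math.sqrt(sum((t - a - b * s) ** 2 for s, t in zip(x, Y)) / (len(Y) - 2))


# Hypothetical illustration data (not taken from the paper).
X1 = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(Y)
Z1, ZY = z_scores(X1), z_scores(Y)
r = pearson_r(X1, Y)

see_z = see_one_predictor(Z1, ZY)
print(stdev(ZY), r, pearson_r(Z1, ZY))                      # 1, and r is unchanged
print(see_z, math.sqrt((n - 1) / (n - 2) * (1 - r ** 2)))   # the two agree
print(see_z, see_one_predictor(X1, Y) / stdev(Y))           # Z result = raw result / S_y
```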
If one were to extend this logic to the case of p standardized predictors,
the standard error of estimate for p standardized predictors is:

$$S_{Z_Y.Z_1,Z_2,\ldots,Z_j,\ldots,Z_p} = S_{Z_Y}\sqrt{\frac{n-1}{n-(p+1)}\left(1 - R^2_{Z_Y.Z_1,Z_2,\ldots,Z_j,\ldots,Z_p}\right)} = \sqrt{\frac{n-1}{n-(p+1)}\left(1 - R^2_{Y.x_1,x_2,\ldots,x_j,\ldots,x_p}\right)}$$

For the p predictor case, $S_{Z_Y}$ also is equal to 1.

It remains to be proved that the squared multiple R's
are equal to one another. It can be shown that they
are equal for p predictors, although this statement is
not proved in this paper.
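Although the equality is not proved here, it can be illustrated numerically. The following is a minimal sketch on hypothetical data with p = 3, assuming pure Python and illustrative helper names (`r2_and_see`, `solve`); it is an illustration, not a proof. It fits the same model to raw scores and to Z scores and compares the squared multiple correlations and standard errors of estimate.

```python
import math
from statistics import mean, stdev


def deviations(v):
    """Deviation scores v - mean(v)."""
    m = mean(v)
    return [t - m for t in v]


def z_scores(v):
    """Standard scores using the n - 1 standard deviation."""
    s = stdev(v)
    return [t / s for t in deviations(v)]


def dot(u, w):
    """Sum of cross products."""
    return sum(s * t for s, t in zip(u, w))


def solve(A, v):
    """Solve A b = v by Gauss-Jordan elimination with partial pivoting."""
    k = len(v)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def r2_and_see(Xs, Y):
    """Squared multiple R and the definitional unbiased standard error of estimate."""
    n, p = len(Y), len(Xs)
    xs, y = [deviations(col) for col in Xs], deviations(Y)
    b = solve([[dot(xi, xj) for xj in xs] for xi in xs], [dot(xj, y) for xj in xs])
    y_hat = [sum(bj * xj[i] for bj, xj in zip(b, xs)) for i in range(n)]  # deviation form
    R2 = dot(y, y_hat) ** 2 / (dot(y, y) * dot(y_hat, y_hat))
    see = math.sqrt(sum((s - t) ** 2 for s, t in zip(y, y_hat)) / (n - (p + 1)))
    return R2, see


# Hypothetical data with p = 3 predictors (not taken from the paper).
Xs = [[1, 2, 3, 4, 5, 6, 7, 8],
      [2, 1, 4, 3, 6, 5, 8, 7],
      [5, 3, 4, 1, 2, 6, 8, 7]]
Y = [4, 5, 9, 8, 13, 12, 18, 17]
n, p = len(Y), len(Xs)

R2_raw, see_raw = r2_and_see(Xs, Y)
R2_z, see_z = r2_and_see([z_scores(col) for col in Xs], z_scores(Y))
print(R2_raw, R2_z)                                              # equal
print(see_z, math.sqrt((n - 1) / (n - (p + 1)) * (1 - R2_raw)))  # equal
print(see_z, see_raw / stdev(Y))                                 # equal
```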
Outline for Derivations

The reader who desires to derive the unbiased standard error of
estimate for p linear standardized predictors may use the following
outline as a guide. Essentially, the steps parallel those for the raw
score model. First, the definitional form for the standard error of
estimate is stated. Substituting the terms of the regression model for
p predictors is the second step (see O'Brien, 1983c). Third, square the
multinomial expression. Next, a series of equations is substituted into
the squares and cross products of the squared multinomial. The reader
may refer to the author's paper (1983c) for the relevant equations. The
simplified expression is then expressed in summation notation. Functions
of the squared multiple R are substituted. Upon simplification, the
result will be the unbiased standard error of estimate for the Z score
model.

Many students who work out the derivations for the Z score model prefer
to work with several predictors in succession. This was our approach
for the raw score model derivations. A careful review of the steps used
in the raw score derivations may be helpful in working through the long,
tedious algebra.




Appendix A

Errata for "A derivation of the sample multiple correlation formula for raw scores," ED 235 205

Page* | Now reads | Correct to
10, footnote, 3 lines down | $X Y$ | $X_1 Y$
10, footnote, 4 lines down | $n X Y$ | $n X_1 Y$
13 | var ($b_2$, $x_2$) | var($b_2 x_2$)
16, footnote, last 2 lines | ". . . and simplifying. See text for details." | "See the text for details."
17, footnote | Multiple R | multiple R
24, footnote 1 | $i \ne j$ | Omit this.
29, equation $x_p Y$ | $b_p X_p$ | $b_p x_p$
30, 3 lines from bottom | $b_2 b_j r_{2j} S_2 S_j$ | $b_2 b_p r_{2p} S_2 S_p$
34, 2nd equation | $= \ldots + b_j r_{yj} S_y S_j$ | change = to +
36, 2 lines from bottom of text | mathematical calculus | mathematical statistics
38, 2nd equation | [illegible] | [illegible]
43, last line in text | $S_j^2 = 0$ | $S_j^2 = 1$

* Page number at top of text.
Appendix B

Discussion of Linear and Nonlinear Regression Models



This appendix will clarify terminology used in two previous papers
(O'Brien, 1982c, 1983a). Some readers have requested clarification of my use
of the terms "linear" and "nonlinear" as they apply to regression analysis.

There are two reasons why this should be done. First, the terminology
and/or notation used in applied social science statistics textbooks and
similar sources is quite variable. This has the potential for causing
confusion in students' minds when attempting to read the same subject matter
in different sources. Second, it is very important to be clear about the
differences between a linear and a nonlinear regression model. As will be
seen, "truly" nonlinear regression models are not often used in many areas
of social science.

Our aim in this appendix is merely to clarify the uses of the
terminology. References are cited at the end of the appendix for readers who
desire to learn more about nonlinear regression models.

I believe confusion exists in the use of the terminology for several
reasons. Perhaps the basic factor relates to what students learn in
nonstatistical mathematics courses. The terms linear/nonlinear as they
relate to functions or relationships discussed in mathematics textbooks are
not used in the same way by statisticians when discussing linear/nonlinear
regression models.

Consider a simple example of the parabola (or quadratic or second degree
equation):

$$Y = f(X) = 9 - X^2, \qquad -3 \le X \le 3$$
If this function is plotted on ordinary graphing paper for values of X
from -3 to +3, the plot would show a curve opening downward with a maximum
height of 9 Y units at the origin. This function is not linear in form
because it cannot be expressed in the form of a first degree equation:

$$Y = f(X) = a + bX$$

Geometrically, a plot of the quadratic function above would not reveal a
straight line or linear function. For these two reasons, the parabola may
be thought of as a "nonlinear" function.

Statisticians use the terms linear/nonlinear in a different manner. In the
statistician's use of the terms, the difference between them has more to do
with the form of the regression parameters (slope terms) than with the form
of the independent or dependent variables. In addition, a plot of the raw
observed data points is not relevant to classifying a regression model as
linear or nonlinear.

Let us examine some examples. Assume the following regression model
(adapted from Draper and Smith, p. 264):

$$F = \exp\left(b_1 + b_2 X^2 + e\right) \qquad (1)$$



Where:

F = the dependent variable,
exp = the exponential function, i.e., the mathematical constant e ≈ 2.71828 raised to the power of its argument,
$b_1$, $b_2$ = the parameters to be estimated,
X = the independent variable,
e = the stochastic error term (as used in this paper).



Note that equation 1 expresses what we have been calling a
"raw score model"; for example, we could write equation 1 as:

$F = \exp(\hat{F} + e)$, where $\hat{F} = b_1 + b_2 X^2$.



Is the model in (1) a linear or a nonlinear regression
model? We need to examine the terms in (1) to decide.



Let us now rework equation 1 to render the model linear.
If we take the natural logarithm of each side of equation 1, we obtain:
$\ln F = \ln\left[\exp(b_1 + b_2 X^2 + e)\right] = b_1 + b_2 X^2 + e \qquad (2)$



We now redefine the terms in equation 2. Let:

$Y = \ln F$,
$X' = X^2$.

Then (2) becomes:

$Y = b_1 + b_2 X' + e \qquad (3)$



Equation 1 has thus been linearized. Statisticians would call the regression
model expressed in (3) a linear model, even though the relationship between
the original dependent and independent variables, F and X, is not a straight line.

Draper and Smith offer useful terminology to distinguish (1) from (3).
The regression model stated in (1) may be referred to as intrinsically linear.
This means that although equation 1 is nonlinear with respect to the
parameters $b_1$ and $b_2$, transformations can be made to express the model
in a form that is linear with respect to the parameters.
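The linearizing transformation can be sketched numerically. The following Python fragment is a minimal illustration under stated assumptions: the parameter values, the error standard deviation and the helper name `fit_simple_ols` are hypothetical choices made for the demonstration, not values from Draper and Smith. It simulates data from equation 1, applies the substitutions Y = ln F and X' = X², and fits equation 3 by ordinary least squares.

```python
import math
import random


def fit_simple_ols(x, y):
    """Least-squares intercept b1 and slope b2 for y = b1 + b2*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b2 = sxy / sxx
    return mean_y - b2 * mean_x, b2


# Hypothetical true parameters and error SD, chosen only for illustration.
B1_TRUE, B2_TRUE, ERROR_SD = 0.5, -0.3, 0.05

random.seed(1)
xs = [i / 10 for i in range(-30, 31)]  # X from -3.0 to 3.0 in steps of 0.1

# Equation 1: F = exp(b1 + b2*X**2 + e), with the error inside the exponent.
fs = [math.exp(B1_TRUE + B2_TRUE * x ** 2 + random.gauss(0, ERROR_SD)) for x in xs]

# Substitutions leading to equation 3: Y = ln F and X' = X**2.
ys = [math.log(f) for f in fs]
x_primes = [x ** 2 for x in xs]

# Equation 3 is linear in b1 and b2, so ordinary least squares applies directly.
b1_hat, b2_hat = fit_simple_ols(x_primes, ys)
print(f"b1 estimate = {b1_hat:.3f}, b2 estimate = {b2_hat:.3f}")
```

The estimates land close to the assumed 0.5 and -0.3. The transformation works cleanly here because the error term sits inside the exponent in equation 1, so it becomes an additive error after taking logarithms.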

To take a second example (also from Draper and Smith), consider the
following regression model:

$G = \frac{b_1}{b_1 - b_2}\left[\exp(-b_2 X) - \exp(-b_1 X)\right] + e \qquad (4)$



Where:

G = the dependent variable,
exp = as in equation 1,
$b_1$, $b_2$ = the parameters,
X = the independent variable.



This model is nonlinear with respect to the parameters. In addition,
equation 4 cannot be transformed so that the parameters are linear in
form. Draper and Smith refer to such a regression model as
intrinsically nonlinear.
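One common working check for linearity in the parameters is whether the partial derivative of the model with respect to each parameter is free of the parameters. The Python sketch below applies that check numerically to the systematic parts of equations 1 and 3 (error term omitted). The helper names, the evaluation point X = 1.5 and the two trial parameter vectors are arbitrary, hypothetical choices. It is a heuristic spot check, not a proof, and it only examines a model in the form given: it cannot tell whether some transformation exists that would linearize the model, which is exactly what separates an intrinsically linear model such as (1) from an intrinsically nonlinear one such as (4).

```python
import math


def partial(model, params, index, x, h=1e-6):
    """Central-difference estimate of d(model)/d(params[index]) at a fixed x."""
    up = list(params)
    up[index] += h
    down = list(params)
    down[index] -= h
    return (model(up, x) - model(down, x)) / (2 * h)


def linearized(params, x):
    # Systematic part of equation 3: b1 + b2 * X', with X' = X**2.
    b1, b2 = params
    return b1 + b2 * x ** 2


def original(params, x):
    # Systematic part of equation 1 (error term omitted): exp(b1 + b2 * X**2).
    b1, b2 = params
    return math.exp(b1 + b2 * x ** 2)


def depends_on_params(model, x=1.5, trial_params=((0.0, 0.0), (1.0, -0.5))):
    """True if some partial derivative changes when the parameter values change."""
    for index in range(2):
        d0 = partial(model, trial_params[0], index, x)
        d1 = partial(model, trial_params[1], index, x)
        if not math.isclose(d0, d1, rel_tol=1e-6, abs_tol=1e-6):
            return True
    return False


print("equation 3 nonlinear in parameters:", depends_on_params(linearized))  # False
print("equation 1 nonlinear in parameters:", depends_on_params(original))    # True
```

For equation 3 the derivatives are 1 and X², whatever the parameter values; for equation 1 the derivative with respect to $b_1$ is $\exp(b_1 + b_2 X^2)$ itself, so it changes with the parameters.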

Further discussion and examples of linear/nonlinear regression models
may be found in Kendall and Stuart (1987), Mosteller and Tukey (1977), and
Nie et al. (1975).
Notes



1 

See O'Brien (1983a, Appendix B) for an errata sheet. Page 
references given in the errata pertain to the original pagination 
(i.e., at the top of the page). 

2 

Errata for this paper are given in Appendix A of the present paper. 

3 

Readers who need to review regression analysis theory can refer 
to standard applied statistics textbooks. One that is highly 
recommended for its thoroughness and clarity is by 
Lindeman, Gold and Merenda (1982). A general overview is given
by Lewis-Beck (1980).

4 

See Appendix B for discussion of linear and nonlinear
regression models.

5 

If it is understood that the summation limits range from the first
observation (i = 1) to the last (i = n), then we can drop the summation
limits; n refers to the total number of observations for the
criterion and predictor(s). This sample size is the same
regardless of the number of predictors. Later, when the algebra
becomes more complex, we use summation limits extensively.

6 

As mentioned earlier, it is assumed that the reader is familiar 
with the author's 1983a paper. 

7 

The regression model for one standardized predictor is:

$\hat{Z}_Y = A + B_1 Z_1$

The observed standard score model is:

$Z_Y = \hat{Z}_Y + e_Z$

where:

$\hat{Z}_Y$ = the predicted criterion in standard score form,
A = the intercept term (not standardized; see O'Brien, 1982c),
$Z_1$ = the standardized predictor; i.e., $Z_1 = (X_1 - \bar{X}_1)/S_1$, where $S_1$ is the standard deviation of $X_1$,
$B_1$ = the standardized slope term (regression or beta weight),
$e_Z$ = the prediction error.

8 

The reader may wonder why we divide by the term n - 2. This term
represents the "degrees of freedom" for the unbiased standard error
of estimate for one predictor.

It can be shown that dividing by the appropriate degrees of freedom
term makes the sample standard error of estimate unbiased; i.e.,
the expected value of the sample estimate equals the population
parameter. (Strictly speaking, it is the squared standard error of
estimate, the error variance, whose expected value equals the population
error variance; taking the square root introduces a slight downward bias.)

In general, the degrees of freedom for the unbiased standard error
of estimate is n - (p + 1), where p = the number of predictors in the
regression model. For one predictor, n - (p + 1) = n - (1 + 1) = n - 2.
The term p + 1 arises from the number of parameters that are estimated
in any raw score linear regression model: p slope (b) terms plus the
intercept term.

For a good discussion of degrees of freedom, see the classic paper
by Helen Walker (1940, 1971). See also Stilson (1966).



9 

An alternative approach to the derivations would be to work
in matrix algebra notation. The author intends to present the
derivations of this paper and others in this series in matrix
algebra; they will be written as part of this series for ERIC.
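Notes 7 and 8 can be checked on a small example. The Python sketch below uses a hypothetical five-point data set and hypothetical helper names (`fit_simple_ols`, `z_scores`, `pearson_r`, `standard_error_of_estimate`, `mean_error_variances`), and it assumes standard deviations are computed with the n - 1 divisor. It shows that regressing $Z_Y$ on $Z_1$ gives an intercept of zero and a slope equal to the correlation r (note 7), computes the standard error of estimate with the n - (p + 1) divisor (note 8), and runs a small simulation comparing the average of SSE/(n - 2) with the average of SSE/n. Consistent with the remark in note 8, the simulation checks the squared standard error of estimate (the error variance), whose expected value is σ².

```python
import math
import random


def fit_simple_ols(x, y):
    """Least-squares intercept a and slope b for y = a + b*x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def z_scores(v):
    """Standard scores (v - mean) / S, with S using the n - 1 divisor."""
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((vi - m) ** 2 for vi in v) / (n - 1))
    return [(vi - m) / s for vi in v]


def pearson_r(x, y):
    """Correlation as the averaged cross-product of standard scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def standard_error_of_estimate(residuals, p):
    """sqrt(SSE / (n - (p + 1))) for a model with p predictors."""
    sse = sum(r ** 2 for r in residuals)
    return math.sqrt(sse / (len(residuals) - (p + 1)))


def mean_error_variances(n=5, reps=20000, sigma=1.0, seed=7):
    """Average SSE/(n-2) and SSE/n over simulated samples with error SD sigma."""
    rng = random.Random(seed)
    xs = list(range(1, n + 1))
    total_df, total_n = 0.0, 0.0
    for _ in range(reps):
        # Hypothetical population line 2 + 0.5*X plus normal error.
        ys = [2.0 + 0.5 * xi + rng.gauss(0, sigma) for xi in xs]
        a, b = fit_simple_ols(xs, ys)
        sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
        total_df += sse / (n - 2)
        total_n += sse / n
    return total_df / reps, total_n / reps


# Hypothetical five-point data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Note 7: regressing Z_Y on Z_1 gives A = 0 and B_1 = r.
A, B1 = fit_simple_ols(z_scores(x), z_scores(y))
print(f"A = {A:.1e}, B1 = {B1:.4f}, r = {pearson_r(x, y):.4f}")  # B1 and r ≈ 0.7746

# Note 8: one predictor (p = 1), so the divisor is n - 2 = 3.
a, b = fit_simple_ols(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
see = standard_error_of_estimate(residuals, p=1)
print(f"standard error of estimate = {see:.4f}")  # ≈ 0.8944, i.e. sqrt(2.4 / 3)

# Expected values here: E[SSE/(n-2)] = sigma**2 = 1 and E[SSE/n] = 3/5 = 0.6.
avg_df, avg_n = mean_error_variances()
print(f"mean SSE/(n-2) = {avg_df:.3f}, mean SSE/n = {avg_n:.3f}")
```

The simulated average of SSE/(n - 2) sits near 1, while SSE/n falls short by the factor (n - 2)/n, which is the bias the degrees-of-freedom divisor removes.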